It can sometimes be a little challenging to figure out specifically how to address different vulnerability classes in Python. This article addresses one of the top finding categories found in Python, CWE 117 (also known as CRLF Injection), and shows how to use a custom log formatter to address the issue. We’ll use this project, which deactivates or deletes user accounts from the Veracode platform, to illustrate the functionality.
The vulnerability
CWE 117 (sometimes classified as CWE 93) is (normally, see note below) a medium severity finding that compromises the integrity of logging information by allowing an attacker to insert extra log statements, corrupt the logs so that they become unreadable, or even inject malicious code into the logs (useful if the log will be read through a web user interface). The attacker does this by inserting data containing carriage return and line feed (CRLF) characters, causing the appearance of a new logging statement.
Note on classification: CWE 93 refers to a broader set of weaknesses with handling content containing CRLF characters. It applies to logs and also to HTTP headers (CWE 113), sending email messages, or any output format where carriage returns and line feeds are significant characters; CWE 117 is the log-specific version of it. This article focuses specifically on issues where CRLF injection occurs in a logging context (CWE 117).
Example
This code snippet is vulnerable to CRLF injection:
import logging
import sys
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG, stream=sys.stderr)
... # additional logger setup
dangerous_value = "This line splits\r\nthe log entry by including CRLF"
logger.warning("The value of dangerous_value is {}".format(dangerous_value))
# WARNING:__main__:The value of dangerous_value is This line splits
# the log entry by including CRLF
# Note how the above ^ makes two lines, messing up log integrity
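To make the forgery risk concrete, here is a minimal sketch using only the standard library. The logger name, the user name, and the forged message are hypothetical; the point is that input shaped like a log record becomes indistinguishable from a genuine entry once the log is split into lines.

```python
import io
import logging

# Capture log output in memory so the forged entry is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

demo_logger = logging.getLogger("crlf_demo")  # hypothetical logger name
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep the demo output out of the root logger

# Hypothetical attacker-controlled input: a user name followed by CRLF and
# text shaped like a genuine log record.
username = "alice\r\nINFO:crlf_demo:Deleted user admin"
demo_logger.warning("Login failed for user %s", username)

# Anyone reading the log line by line now sees two separate entries.
log_lines = stream.getvalue().splitlines()
print(log_lines)
```

The second element of log_lines was never emitted by the application, yet it has the same shape as a real INFO record.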
The fix
Before we get into the fix, it’s worth noting that not every application has a strong requirement for log integrity – a local command line script may not require as much attention to this vulnerability category as a system where auditing is a requirement and that takes input from multiple users. See also the note on severity below.
Assuming that log integrity is important for your application (and in most cases it probably is), the strategy for fixing CRLF injection vulnerabilities is to sanitize all user inputs, ensure that you use a consistent character encoding throughout the application (to avoid problems from canonicalization), and escape output. Dealing with the first two issues is beyond the scope of this article, but applying an output escaping strategy is pretty straightforward by using a logging formatter. For the purposes of this blog, we’ll use logging-formatter-anticrlf from Veracode Research; see the Alternatives section for some other approaches you could take.
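As a rough illustration of what output escaping in a formatter means, the sketch below subclasses the standard library's logging.Formatter and replaces CR and LF with their escaped forms. The class name EscapingFormatter and the logger name are hypothetical, and this is not the anticrlf implementation, only a stand-in for the general idea.

```python
import io
import logging


class EscapingFormatter(logging.Formatter):
    """Hypothetical stand-in: escape CR and LF in the formatted record."""

    def format(self, record):
        formatted = super().format(record)
        # Replace the raw control characters with visible escape sequences.
        return formatted.replace("\r", "\\r").replace("\n", "\\n")


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(EscapingFormatter("%(levelname)s:%(name)s:%(message)s"))

sketch_logger = logging.getLogger("escaping_sketch")  # hypothetical name
sketch_logger.addHandler(handler)
sketch_logger.setLevel(logging.INFO)
sketch_logger.propagate = False

sketch_logger.warning("value is %s", "first\r\nsecond")
escaped_output = stream.getvalue()
print(escaped_output)
```

Because this sketch escapes the whole formatted record, multi-line tracebacks attached to a record would also be collapsed onto one line; whether that is acceptable depends on how the logs are consumed.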
The logging-formatter-anticrlf library functions as a drop-in logging formatter, but it escapes carriage returns and line feeds in the output. Darren’s readme shows how to use the library for stream-based logging; the project above shows an example of using it with logging to a file. Here’s how:
- First, we install logging-formatter-anticrlf using pip install logging-formatter-anticrlf.
- We import logging and anticrlf.
- We set up the logger (this routine is called from within the if __name__ == '__main__': block at the bottom of the file):
def setup_logger():
    # Write to a UTF-8 log file and escape CR/LF in every formatted record
    handler = logging.FileHandler('vcoffboard.log', encoding='utf8')
    handler.setFormatter(anticrlf.LogFormatter('%(levelname)s:%(name)s:%(message)s'))
    logger = logging.getLogger(__name__)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
- We define a variable for the instance of logger that we configured, and make sure that all our logging statements call this variable rather than calling the logging module directly. Importantly, we repeat this in any helper files to make sure they’re all using the same log file configuration:
# assume setup_logger() has been called once for this __name__
logger = logging.getLogger(__name__)
...
dangerous_value = "This line splits\r\nthe log entry by including CRLF"
logger.warning("The value of dangerous_value is {}".format(dangerous_value))
# WARNING:__main__:The value of dangerous_value is This line splits\r\nthe log entry by including CRLF
# Note that the CR and LF are escaped so the log entry is correctly all on one line
And that’s it! As long as you call the logger using the logger variable, it will format the logs with our anticrlf formatter and escape the log output correctly.
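One possible way to share a single configuration across helper files, sketched here as an assumption rather than as what the project does, is to attach the handler to a parent logger and let child loggers propagate to it. The logger names vcoffboard and vcoffboard.helpers are hypothetical, and a plain logging.Formatter stands in for anticrlf.LogFormatter so the sketch needs only the standard library.

```python
import io
import logging

stream = io.StringIO()  # stands in for the log file in this sketch
handler = logging.StreamHandler(stream)
# In a real setup the escaping formatter would be set here instead.
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

# Configure the parent logger once.
app_logger = logging.getLogger("vcoffboard")  # hypothetical parent name
app_logger.addHandler(handler)
app_logger.setLevel(logging.INFO)
app_logger.propagate = False  # avoid duplicate output via the root logger

# A helper module only needs to ask for a child logger by name.
helper_logger = logging.getLogger("vcoffboard.helpers")  # hypothetical child

app_logger.info("starting run")
helper_logger.info("helper called")

shared_lines = stream.getvalue().splitlines()
print(shared_lines)
```

Both records pass through the same handler, so whatever formatter is attached there applies to every module under the parent name.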
Alternatives
What else could we do to fix CWE 117 findings in Python? We have a few options:
- Explore other available libraries that do encoding for logs. Unfortunately, there don’t seem to be a lot available that specifically address the logging context such that you can simply use them as a drop-in replacement for an existing logger.
- Explicitly encode all logging statements. You can certainly explicitly encode every string that you send in a logging statement using Python’s encode() (for example, with the unicode_escape codec) or a similar function. Using encode() has the advantage of not requiring any additional libraries, but the disadvantage of requiring you to remember to apply the fix for every logging statement. The advantage of the library discussed in this post is that it functions as a drop-in; once you configure logging to use it as a formatter, the rest of your code can function as normal and you don’t need to remember to perform explicit encoding.
- Take a different path. One way to avoid the problem entirely might be to switch to using syslog or another external logging system that does not use CRLF to start a new log message. This will be a more substantial change and will require its own secure coding considerations, but avoids this particular flavor of log injection.
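For comparison, the explicit-encoding option might look like the sketch below. The helper name escape_for_log and the logger name are hypothetical; the unicode_escape codec is part of the standard library, and besides CR and LF it also escapes backslashes and non-ASCII characters, which may or may not be what you want in your logs.

```python
import io
import logging


def escape_for_log(value):
    """Hypothetical helper: return str(value) with control characters escaped."""
    # unicode_escape produces bytes such as b'first\\r\\nsecond';
    # decoding as ASCII turns them back into a printable str.
    return str(value).encode("unicode_escape").decode("ascii")


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

explicit_logger = logging.getLogger("explicit_sketch")  # hypothetical name
explicit_logger.addHandler(handler)
explicit_logger.setLevel(logging.INFO)
explicit_logger.propagate = False

# Every call site has to remember to wrap untrusted values.
explicit_logger.warning("value is %s", escape_for_log("first\r\nsecond"))
explicit_output = stream.getvalue()
print(explicit_output)
```

The drawback described above is visible here: the protection exists only where escape_for_log is actually called.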
Summary
It’s common to see large numbers of CWE 117 findings in an application; logging is a common operation for many applications and every log statement has a potential vulnerability to CRLF injection. Applying a strategy such as the one described here allows you to quickly address lots of findings in a fairly straightforward way.
Notes
Severity: As with any vulnerability category, the severity of any individual CWE 117 finding may be higher or lower, depending on the business requirements of the application; specifically, for some applications, log integrity may be less critical than for others.